Cooperation-eliciting prisoner's dilemma payoffs for reinforcement learning agents

Authors

  • Koichi Moriyama
  • Satoshi Kurihara
  • Masayuki Numao
Abstract

This work considers a stateless Q-learning agent in the iterated Prisoner’s Dilemma (PD). We previously gave a condition on the PD payoffs and Q-learning parameters that helps stateless Q-learning agents cooperate with each other [2]. That condition, however, rests on a restrictive premise. This work relaxes the premise and presents a new payoff condition for mutual cooperation. From the new condition, we then derive the payoff relations that elicit mutual cooperation.
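To make the setting concrete, here is a minimal Python sketch of two stateless Q-learning agents playing the iterated PD. It illustrates only the general setup, not the condition derived in this work. The payoff values (T, R, P, S), the learning rate `alpha`, the discount `gamma`, the epsilon-greedy exploration rule, and all function names are assumptions chosen for illustration; none of them come from the abstract.

```python
import random

# Hypothetical payoffs with T > R > P > S (assumed values, not from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
C, D = 0, 1  # action indices: cooperate, defect


def payoff(mine, theirs):
    """Payoff to the player choosing `mine` against `theirs`."""
    table = {(C, C): R, (C, D): S, (D, C): T, (D, D): P}
    return table[(mine, theirs)]


def q_update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update: one state, so Q is indexed by action only."""
    target = reward + gamma * max(q)
    q[action] += alpha * (target - q[action])


def choose(q, epsilon, rng):
    """Epsilon-greedy action selection (an assumed exploration rule)."""
    if rng.random() < epsilon:
        return rng.choice([C, D])
    return C if q[C] >= q[D] else D


def play(rounds, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Run two independent learners against each other; return their Q-values."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    for _ in range(rounds):
        a1, a2 = choose(q1, epsilon, rng), choose(q2, epsilon, rng)
        q_update(q1, a1, payoff(a1, a2), alpha, gamma)
        q_update(q2, a2, payoff(a2, a1), alpha, gamma)
    return q1, q2
```

Calling `play(1000)` returns the two agents' Q-values for cooperating and defecting. Varying the assumed payoffs and parameters is one way to explore when both agents come to prefer cooperation.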


Similar resources

Mood modelling within reinforcement learning

Simulating mood within a decision making process has been shown to allow cooperation to occur within the Prisoner’s Dilemma. In this paper we propose how to integrate a mood model into the classical reinforcement learning algorithm Sarsa, and show how this addition can allow self-interested agents to be successful within a multi agent environment. The human-inspired moody agent will learn to co...

The Speed of Learning in Noisy Games: Partial Reinforcement and the Sustainability of Cooperation

In an experiment, players’ ability to learn to cooperate in the repeated prisoner’s dilemma was substantially diminished when the payoffs were noisy, even though players could monitor one another’s past actions perfectly. In contrast, in one-time play against a succession of opponents, noisy payoffs increased cooperation, by slowing the rate at which cooperation decays. These observations are c...

Multiagent Reinforcement Learning with Spiking and Non-Spiking Agents in the Iterated Prisoner's Dilemma

This paper investigates Multiagent Reinforcement Learning (MARL) in a general-sum game where the payoffs’ structure is such that the agents are required to exploit each other in a way that benefits all agents. The contradictory nature of these games makes their study in multiagent systems quite challenging. In particular, we investigate MARL with spiking and non-spiking agents in the Iterated P...

Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.

Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. Q-learning is a recent RL algorithm that does not need a model of its environment and can be used on-line. Therefore, it is well suited for use in repeated games against an unknown opponent....

Backward vs. Forward-Oriented Decision Making in the Iterated Prisoner's Dilemma: A Comparison Between Two Connectionist Models

We compare the performance of two connectionist models developed to model specific aspects of the decision making process in the Iterated Prisoner’s Dilemma Game. Both models are based on common recurrent network architecture. The first of them uses a backward-oriented reinforcement learning algorithm for learning to play the game while the second one makes its move decisions based on generated...

Publication date: 2014